Entropy Based Feature Selection For Multi-Relational Naïve Bayesian Classifier
Authors
Abstract
Most industrial data today are stored in relational structures. The usual approach to mining such data joins several relations into a single relation through foreign-key links, a process known as flattening. Flattening can be time-consuming and can introduce data redundancy and statistical skew. This raises a critical question: how can data be mined directly across multiple relations? Multi-relational data mining (MRDM) addresses this question. A further problem is that irrelevant or redundant attributes in a relation may contribute nothing to classification accuracy, so feature selection is an essential preprocessing step in multi-relational data mining. Filtering irrelevant and redundant features out of the relations before mining improves classification accuracy, reduces running time, and makes the resulting models easier to understand. We propose an entropy-based feature selection method for the Multi-relational Naïve Bayesian Classifier. The method uses InfoDist and Pearson's correlation to filter irrelevant and redundant features out of the multi-relational database and thereby enhance classification accuracy. We evaluated the algorithm on the PKDD financial dataset, where it achieved better accuracy than existing feature selection methods.
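The two-stage filter described above (discard irrelevant features, then discard redundant ones) can be sketched roughly as follows. The abstract does not define InfoDist, so this minimal sketch substitutes information gain, an entropy-based relevance score, as a stand-in for it, and uses Pearson's correlation to drop any feature that is strongly correlated with a more relevant feature already kept. It assumes the attributes have already been aligned with the target tuples (for example, propagated along foreign-key links) and integer-coded. The function names, the thresholds `min_relevance` and `max_corr`, and the PKDD-style feature names in the example are all hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def info_gain(feature, labels):
    """H(class) - H(class | feature): how much the feature reveals about the class."""
    n = len(labels)
    cond = 0.0
    for v in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == v]
        cond += (len(subset) / n) * entropy(subset)
    return entropy(labels) - cond


def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def select_features(features, labels, min_relevance=0.01, max_corr=0.9):
    """Hypothetical filter: drop irrelevant features, then redundant ones.

    features: dict mapping feature name -> list of integer-coded values
    labels:   list of class labels aligned with every feature list
    """
    # Assumption: information gain stands in for InfoDist, which the abstract does not define.
    # Rank by relevance (highest first); ties are broken by name for determinism.
    scored = sorted(
        ((info_gain(col, labels), name) for name, col in features.items()),
        key=lambda t: (-t[0], t[1]),
    )
    selected = []
    for score, name in scored:
        if score < min_relevance:
            continue  # irrelevant: carries (almost) no information about the class
        # Redundant: strongly correlated with a more relevant feature already kept.
        if any(abs(pearson(features[name], features[k])) >= max_corr for k in selected):
            continue
        selected.append(name)
    return selected


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    features = {
        "balance_band": [0, 0, 1, 1],  # perfectly predicts the class
        "balance_copy": [5, 5, 9, 9],  # same information, linearly related to balance_band
        "account_age": [0, 1, 1, 1],   # partly informative, not redundant
        "district": [0, 1, 0, 1],      # independent of the class
    }
    print(select_features(features, labels))  # ['balance_band', 'account_age']
```

The greedy order matters: because features are visited from most to least relevant, a redundant pair always keeps its more informative member, and the Pearson check only compares a candidate against features that have already survived both tests.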
Similar resources
A New Hybrid Framework for Filter based Feature Selection using Information Gain and Symmetric Uncertainty (TECHNICAL NOTE)
Feature selection is a pre-processing technique used for eliminating the irrelevant and redundant features which results in enhancing the performance of the classifiers. When a dataset contains more irrelevant and redundant features, it fails to increase the accuracy and also reduces the performance of the classifiers. To avoid them, this paper presents a new hybrid feature selection method usi...
Effective Discretization and Hybrid feature selection using Naïve Bayesian classifier for Medical datamining
As a probability-based statistical classification method, the Naïve Bayesian classifier has gained wide popularity despite its assumption that attributes are conditionally mutually independent given the class label. Improving the predictive accuracy and achieving dimensionality reduction for statistical classifiers has been an active research area in datamining. Our experimental results suggest...
An Efficient Multi-relational Naïve Bayesian Classifier Based on Semantic Relationship Graph
Classification is one of the most popular data mining tasks with a wide range of applications, and lots of algorithms have been proposed to build accurate and scalable classifiers. Most of these algorithms only take a single table as input, whereas in the real world most data are stored in multiple tables and managed by relational database systems. As transferring data from multiple tables into...
Naïve Bayesian Based on Chi Square to Categorize Arabic Data
Text classification is a supervised technique that uses labelled training data to learn the classification system and then automatically classifies the remaining text using the learned system. This paper investigates Naïve Bayesian algorithm based on Chi Square features selection method. The base of our comparisons are macro F1, macro recall and macro precision evaluation measures. The experime...
Fault diagnosis of gearboxes using LSSVM and WPT
This paper concentrates on a new procedure which experimentally recognises gears and bearings faults of a typical gearbox system using a least square support vector machine (LSSVM). Two wavelet selection criteria Maximum Energy to Shannon Entropy ratio and Maximum Relative Wavelet Energy are used and compared to select an appropriate wavelet for feature extraction. The fault diagnosis method co...